Results 1 - 7 of 7
1.
medrxiv; 2022.
Preprint in English | medRxiv | ID: ppzbmed-10.1101.2022.11.14.22282297

ABSTRACT

The continuing emergence of SARS-CoV-2 variants of concern (VOCs) presents a serious public health threat, exacerbating the effects of the COVID-19 pandemic. Although millions of genomes have been deposited in public archives since the start of the pandemic, predicting SARS-CoV-2 clinical characteristics from the genome sequence remains challenging. In this study, we used a collection of over 29,000 high-quality SARS-CoV-2 genomes to build machine learning models for predicting clinical detection cycle threshold (Ct) values, which are inversely related to viral load. After we evaluated several machine learning methods and parameters, the best model was a random forest regressor that used 10-mer oligonucleotides as features and achieved an R² of 0.521 ± 0.010 (95% confidence interval over 5 folds) and a root-mean-square error (RMSE) of 5.7 ± 0.034, demonstrating that the genomic data carry a detectable signal for Ct values. To test prediction for newly emerging variants, we predicted Ct values for Omicron variants using models trained on earlier variants. We found that approximately 5% of the training data needed to come from the new variant for the model to learn its Ct values. Finally, to understand how the model works, we evaluated its top features and found that it draws on many k-mers from across the genome. However, the top k-mers that occurred most frequently across the set of genomes clustered in spike protein regions corresponding to key mutations that are hallmarks of the VOCs, including G339, K417, L452, N501, and P681, indicating that these sites are informative in the model and may affect the Ct values observed in clinical samples.
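A minimal, standard-library sketch of the feature and evaluation steps named in this abstract: overlapping 10-mer counts per genome, plus the R² and RMSE metrics used to score Ct predictions. Skipping k-mers that contain ambiguous bases (such as N) is an assumption, not a detail from the abstract, and the function names are hypothetical. The random forest itself is not reimplemented here; in practice the counts would be vectorized and passed to a regressor such as scikit-learn's `RandomForestRegressor`.

```python
from collections import Counter
import math


def kmer_counts(seq: str, k: int = 10) -> Counter:
    """Count overlapping k-mers; windows containing non-ACGT bases are skipped (assumption)."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def r2_score(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def rmse(y_true, y_pred) -> float:
    """Root-mean-square error, in the same units as Ct (PCR cycles)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Toy example with k = 4 so the output stays readable.
print(kmer_counts("ACGTACGTACGT", k=4))
print(r2_score([20, 25, 30], [22, 25, 28]), rmse([20, 25, 30], [22, 25, 28]))
```

Because RMSE is expressed in cycles, an RMSE of 5.7 means typical prediction errors of several PCR cycles, while R² summarizes how much of the Ct variance the genome-derived features explain.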


Subject(s)
COVID-19
2.
biorxiv; 2022.
Preprint in English | bioRxiv | ID: ppzbmed-10.1101.2022.10.10.511571

ABSTRACT

Our work seeks to transform how new and emergent variants of pandemic-causing viruses, especially SARS-CoV-2, are identified and classified. By adapting large language models (LLMs) for genomic data, we build genome-scale language models (GenSLMs) that can learn the evolutionary landscape of SARS-CoV-2 genomes. By pre-training on over 110 million prokaryotic gene sequences and then fine-tuning a SARS-CoV-2-specific model on 1.5 million genomes, we show that GenSLMs can accurately and rapidly identify variants of concern. To our knowledge, GenSLM represents one of the first whole-genome-scale foundation models that can generalize to other prediction tasks. We demonstrate the scaling of GenSLMs on both GPU-based supercomputers and AI-hardware accelerators, achieving over 1.54 zettaflops in training runs. We present initial scientific insights gleaned from examining GenSLMs in tracking the evolutionary dynamics of SARS-CoV-2, while noting that their full potential on large biological data is yet to be realized.
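The abstract does not say how genomes are tokenized for the language model. As an assumption for illustration only, the sketch below splits an in-frame coding sequence into non-overlapping codons and maps each codon to an integer ID, the kind of input a transformer-style model consumes. The function names and the `[PAD]`/`[UNK]` special tokens are hypothetical.

```python
def codon_tokens(seq: str) -> list[str]:
    """Split an in-frame coding sequence into non-overlapping 3-nt tokens.

    A trailing partial codon (1-2 leftover bases) is dropped.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def build_vocab(token_lists) -> dict[str, int]:
    """Assign integer IDs in first-seen order, after two hypothetical special tokens."""
    vocab = {"[PAD]": 0, "[UNK]": 1}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def encode(tokens: list[str], vocab: dict[str, int]) -> list[int]:
    """Map tokens to IDs; codons never seen during vocab building become [UNK]."""
    return [vocab.get(t, vocab["[UNK]"]) for t in tokens]


vocab = build_vocab([codon_tokens("ATGGCTTAA"), codon_tokens("ATGTTTTAA")])
print(encode(codon_tokens("ATGNNNTAAG"), vocab))
```

Codon-level tokens keep each token aligned with one amino acid, which shortens sequences threefold compared with single-nucleotide tokens; whether this matches the tokenization actually used for GenSLMs should be checked against the full paper.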

4.
biorxiv; 2022.
Preprint in English | bioRxiv | ID: ppzbmed-10.1101.2022.02.04.479189

ABSTRACT

White-tailed deer (Odocoileus virginianus) are highly susceptible to infection by SARS-CoV-2, with multiple reports of widespread spillover of virus from humans to free-living deer. While the recently emerged SARS-CoV-2 B.1.1.529 Omicron variant of concern (VoC) has been shown to be notably more transmissible among humans, its ability to infect and spill over to non-human animals remains a concern. We found that 19 of the 131 (14.5%; 95% CI: 0.10-0.22) white-tailed deer opportunistically sampled on Staten Island, New York, between December 12, 2021, and January 31, 2022, were positive for SARS-CoV-2-specific serum antibodies using a surrogate virus neutralization assay, indicating prior exposure. The results also revealed strong evidence of age dependence in antibody prevalence. A significantly greater proportion of yearling deer possessed neutralizing antibodies compared with fawns (χ², p < 0.001; OR = 12.7; 95% CI: 4-37.5). Importantly, SARS-CoV-2 nucleic acid was detected in nasal swabs from 7 of 68 (10.29%; 95% CI: 0.0-0.20) sampled deer, and whole-genome sequencing showed that the SARS-CoV-2 Omicron VoC (B.1.1.529) is circulating among the white-tailed deer on Staten Island. Phylogenetic analyses revealed that the deer Omicron sequences clustered closely with other recently reported Omicron sequences recovered from infected humans in New York City and elsewhere, consistent with human-to-deer spillover. Interestingly, one deer was positive for viral RNA and had a high level of neutralizing antibodies, suggesting either rapid seroconversion during an ongoing infection or a breakthrough infection in a previously exposed animal.
Together, our findings show that the SARS-CoV-2 B.1.1.529 Omicron VoC can infect white-tailed deer and highlight an urgent need for comprehensive surveillance of susceptible animal species to identify ecological transmission networks and better assess the potential risks of spillback to humans.
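The abstract reports prevalence estimates with 95% confidence intervals but does not name the interval method. As an assumption, the sketch below uses the Wilson score interval for a binomial proportion, so its bounds need not match the reported ones exactly; the function name is hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Seroprevalence counts taken from the abstract: 19 positive of 131 deer.
lo, hi = wilson_interval(19, 131)
print(f"{19 / 131:.3f} ({lo:.3f}-{hi:.3f})")  # 0.145 (0.095-0.215)
```

The Wilson interval is often preferred over the simple normal approximation for small samples or proportions near 0, because it never produces bounds outside [0, 1].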


Subject(s)
Breakthrough Pain, Infections
5.
medrxiv; 2021.
Preprint in English | medRxiv | ID: ppzbmed-10.1101.2021.07.19.21260808

ABSTRACT

Genetic variants of SARS-CoV-2 have repeatedly altered the course of the COVID-19 pandemic and of disease in individual patients. Delta variants (B.1.617.2, AY.2, and AY.3) are now the focus of international concern because they are causing widespread COVID-19 disease globally. Vaccine breakthrough cases caused by SARS-CoV-2 variants are also of considerable public health and medical concern worldwide. As part of a comprehensive project, we sequenced the genomes of 3,913 SARS-CoV-2 strains from patient samples collected from March 15, 2021, through July 3, 2021, in the Houston Methodist hospital system and studied vaccine breakthrough cases. During the study period, Delta variants increased to cause 58% of all COVID-19 cases and spread throughout the metropolitan Houston area. In addition, Delta variants caused a significantly higher rate of vaccine breakthrough cases (19.7%, compared with 5.8% for all other variants). Importantly, only 6.5% of all COVID-19 cases occurred in fully immunized individuals, and relatively few of these patients required hospitalization. Our genomic and epidemiologic data emphasize that vaccines used in the United States are highly effective in decreasing severe COVID-19 disease, hospitalizations, and deaths.
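Assuming the two percentages are the share of cases within each variant group that were vaccine breakthroughs (a back-of-the-envelope reading, not a figure reported in the abstract), their ratio gives a rough sense of scale:

```latex
\frac{\text{breakthrough rate, Delta}}{\text{breakthrough rate, other variants}}
  = \frac{19.7\%}{5.8\%} \approx 3.4
```

Under this reading, a Delta case was roughly 3.4 times as likely as a case caused by another variant to be a vaccine breakthrough case.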


Subject(s)
COVID-19
6.
medrxiv; 2021.
Preprint in English | medRxiv | ID: ppzbmed-10.1101.2021.02.26.21252227

ABSTRACT

Since the beginning of the SARS-CoV-2 pandemic, there has been international concern about the emergence of virus variants with mutations that increase transmissibility, enhance escape from the human immune response, or otherwise alter biologically important phenotypes. In late 2020, several "variants of concern" emerged globally, including the UK variant (B.1.1.7), South Africa variant (B.1.351), Brazil variants (P.1 and P.2), and two related California "variants of interest" (B.1.429 and B.1.427). These variants are believed to be more transmissible. For the South Africa and Brazil variants, there is evidence that mutations in the spike protein allow the virus to evade some vaccines and therapeutic monoclonal antibodies. Based on our extensive genome sequencing program involving 20,453 virus specimens from COVID-19 patients dating from March 2020, we report the identification of all important SARS-CoV-2 variants among Houston Methodist Hospital patients residing in the greater metropolitan area. Although these variants are currently at relatively low frequency in the population, they are geographically widespread. Houston is the first city in the United States to have all of these variants documented by genome sequencing. As vaccine deployment accelerates worldwide, increased genomic surveillance of SARS-CoV-2 is essential to understanding the presence and frequency of consequential variants and their patterns and trajectory of dissemination. This information is critical for medical and public health efforts to effectively address and mitigate this global crisis.
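Tracking the presence, frequency, and trajectory of variants through genomic surveillance reduces, at its simplest, to counting lineage calls per time period. A minimal sketch, assuming each sequenced specimen has already been assigned a lineage label and a collection period; the record format, the function name, and the example records are hypothetical, not data from the study.

```python
from collections import Counter, defaultdict


def frequencies_by_period(records):
    """Turn (period, lineage) pairs into {period: {lineage: share}}.

    Periods are returned in sorted order; within a period, lineages are
    listed from most to least common.
    """
    by_period = defaultdict(Counter)
    for period, lineage in records:
        by_period[period][lineage] += 1

    result = {}
    for period in sorted(by_period):
        counts = by_period[period]
        total = sum(counts.values())
        result[period] = {lin: n / total for lin, n in counts.most_common()}
    return result


# Hypothetical lineage calls for two monthly periods.
records = [
    ("2021-01", "other"), ("2021-01", "other"), ("2021-01", "B.1.1.7"),
    ("2021-02", "B.1.1.7"), ("2021-02", "other"),
]
for period, shares in frequencies_by_period(records).items():
    print(period, shares)
```

Comparing the same lineage's share across consecutive periods is the simplest way to see whether a variant is rising; real surveillance analyses would also account for uneven sampling across sites and time.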


Subject(s)
COVID-19
7.
medrxiv; 2020.
Preprint in English | medRxiv | ID: ppzbmed-10.1101.2020.09.22.20199125

ABSTRACT

We sequenced the genomes of 5,085 SARS-CoV-2 strains causing two COVID-19 disease waves in metropolitan Houston, Texas, an ethnically diverse region with seven million residents. The genomes were from viruses recovered in the earliest recognized phase of the pandemic in Houston and in an ongoing massive second wave of infections. The virus was originally introduced into Houston many times independently. Virtually all strains in the second wave have a Gly614 amino acid replacement in the spike protein, a polymorphism that has been linked to increased transmission and infectivity. Patients infected with Gly614 variant strains had significantly higher virus loads in the nasopharynx at initial diagnosis. We found little evidence of a significant relationship between virus genotypes and altered virulence, underscoring the roles of underlying medical conditions and host genetics in disease severity. Some regions of the spike protein, the primary target of global vaccine efforts, are replete with amino acid replacements, perhaps indicating the action of selection. We exploited the genomic data to generate defined single amino acid replacements in the receptor binding domain of the spike protein that, importantly, decreased recognition by the neutralizing monoclonal antibody CR3022. Our study is the first analysis of the molecular architecture of SARS-CoV-2 in two infection waves in a major metropolitan region. The findings will help us understand the origin, composition, and trajectory of future infection waves, and the potential effect of the host immune response and therapeutic maneuvers on SARS-CoV-2 evolution.

IMPORTANCE

There is concern about second and subsequent waves of COVID-19 caused by the SARS-CoV-2 coronavirus occurring in communities globally that had an initial disease wave. Metropolitan Houston, Texas, with a population of 7 million, is experiencing a massive second disease wave that began in late May 2020. To understand SARS-CoV-2 molecular population genomic architecture, evolution, and the relationship between virus genotypes and patient features, we sequenced the genomes of 5,085 SARS-CoV-2 strains from these two waves. Our study provides the first molecular characterization of SARS-CoV-2 strains causing two distinct COVID-19 disease waves.
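Calling the Asp614/Gly614 genotype from a genome amounts to reading codon 614 of the spike coding sequence (nucleotides 1,840-1,842 of the CDS, 1-based). A minimal sketch, assuming the input is the spike CDS already extracted and in frame from its start codon; the reduced codon table and the function names are illustrative, not taken from the study.

```python
# Reduced codon table: only the codons needed to distinguish Asp from Gly.
CODON_TO_AA = {
    "GAT": "D", "GAC": "D",                          # aspartic acid (Asp)
    "GGT": "G", "GGC": "G", "GGA": "G", "GGG": "G",  # glycine (Gly)
}


def residue_at(cds: str, position: int) -> str:
    """Return the amino acid at 1-based `position` of an in-frame coding sequence.

    Codons missing from the reduced table are reported as 'X'.
    """
    start = (position - 1) * 3
    codon = cds[start:start + 3].upper()
    if len(codon) != 3:
        raise ValueError("position lies beyond the end of the sequence")
    return CODON_TO_AA.get(codon, "X")


def spike_614_genotype(spike_cds: str) -> str:
    """Classify a spike CDS as ancestral Asp614, Gly614, or other/unknown."""
    aa = residue_at(spike_cds, 614)
    return {"D": "Asp614 (ancestral)", "G": "Gly614"}.get(aa, "other/unknown")


# Synthetic in-frame sequence: start codon, 612 filler codons, then codon 614.
toy_cds = "ATG" + "AAA" * 612 + "GGT" + "TAA"
print(spike_614_genotype(toy_cds))
```

A production pipeline would more likely align genomes to a reference and use a full codon table; the reduced table only keeps the example short, and any codon outside it is reported as unknown rather than guessed.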


Subject(s)
COVID-19